A hypothesis regarding the development of imitation learning is presented that is rooted in
intrinsic motivations. It is derived from a recently proposed form of intrinsically motivated 
learning (IML) for efficient coding in active perception, wherein an agent learns to perform 
actions with its sense organs to facilitate efficient encoding of the sensory data. To this 
end, actions of the sense organs that improve the encoding of the sensory data trigger an 
internally generated reinforcement signal. Here it is argued that the same IML mechanism 
might also support the development of imitation when general actions beyond those of 
the sense organs are considered: The learner first observes a tutor performing a behavior 
and learns a model of the behavior's sensory consequences. The learner then acts
itself and receives an internally generated reinforcement signal reflecting how well the 
sensory consequences of its own behavior are encoded by the sensory model. Actions 
that are more similar to those of the tutor will lead to sensory signals that are easier to 
encode and produce a higher reinforcement signal. Through this, the learner's behavior is 
progressively tuned to make the sensory consequences of its actions match the learned 
sensory model. I discuss this mechanism in the context of human language acquisition and 
bird song learning where similar ideas have been proposed. The suggested mechanism 
also offers an account for the development of mirror neurons and makes a number of 
predictions. Overall, it establishes a connection between principles of efficient coding, 
intrinsic motivations and imitation. 



Keywords: intrinsic motivation, imitation, efficient coding, active perception, language development, bird song, 
mirror neuron, perceptual fluency 



1. INTRODUCTION 

Imitation is a powerful form of learning where an agent acquires a 
skill from observing the skill being performed by a second agent. 
This can dramatically speed up the learning of useful behaviors 
compared to random exploration (Miller and Dollard, 1941). In 
the animal learning literature, imitation has been defined as "the 
copying of a novel or otherwise improbable act or utterance, 
or some act for which there is clearly no instinctive tendency" 
(Thorpe, 1963), but many other more or less stringent definitions 
exist. Many authors reserve the term imitation for situations where
the behavior in question is not yet in the behavioral repertoire of 
the imitating agent (Clayton, 1978), but assessing the behavioral 
repertoire of an animal is in itself problematic. In the following, I 
will simply use imitation as an umbrella term for various forms of 
social learning and highlight important distinctions in the context 
of specific examples. 

Despite many years of research, the origin and development of 
imitation abilities in animals and humans are still poorly under- 
stood (Heyes, 2001). While some theories have proposed that 
the ability to imitate relies on sophisticated innate mechanisms 
(Meltzoff and Moore, 1997), other accounts have emphasized 
the role of generic learning mechanisms for the development of 
imitative behaviors (Miller and Dollard, 1941; Gewirtz, 1969). 
Recent learning accounts considering possible underlying
neurobiological mechanisms have rested on associative (Hebbian)
learning (Heyes and Ray, 2000; Keysers and Perrett, 2004) or
reinforcement learning (Triesch et al., 2007). These are sufficient 
for the development of a simple form of imitation also called 
response facilitation, where the agent learns to map the observa- 
tion of a behavior performed by a second agent onto an already 
existing motor representation for performing the same behav- 
ior. This motor representation could already be present at birth 
or have been learned previously through random exploration of 
movement possibilities, often referred to as babbling. Importantly, 
however, these accounts have difficulties explaining the devel- 
opment of what is sometimes called true imitation, where the 
to-be-learned skill is not yet in the behavioral repertoire of the 
developing agent. This is the much more difficult and interest- 
ing case, because it addresses how imitation could accelerate the 
acquisition of novel skills. 

An important example is speech acquisition, where the infant 
learns to produce utterances from her native language based on 
interactions with her caregivers. Infants are capable of statistical 
learning and readily discover statistical patterns of their native 
language, but social interaction with caregivers is also critical
for the normal development of speech (see Kuhl, 2004, for a review).
A closely related case is the acquisition of songs in certain species 
of song birds. This learning has been related to human language 
learning (Marler, 1970; Doupe and Kuhl, 1999) and is used as a
model system for it. As early as 1773 it was shown that birds learn 
their song(s) from experience during development (Barrington,
1773). For example, male juvenile zebra finches usually learn 
to sing a song that closely resembles that of their father. The 
learning proceeds in two phases. During a first phase of purely 
sensory learning, the juvenile bird is suspected to form an audi- 
tory template of the father's (or other social tutor's) song (Baptista 
and Petrinovich, 1984; Konishi, 2010). During a second phase 
of sensory-motor learning, the bird learns to produce a song to 
match the learned template. Depending on the species, the sen- 
sory and sensory-motor phases may or may not overlap. Presently 
it is still unclear through what precise mechanisms the juvenile 
bird manages to better and better approximate the father's song. 
Here I discuss how a recently proposed intrinsically motivated 
learning (IML) mechanism for efficient coding in active percep- 
tion might be generalized for this form of imitation learning. 
This suggests that principles of efficient sensory coding may be 
a foundation for song learning in birds and speech acquisition in 
humans. 

Intrinsic motivations have recently come into focus as impor- 
tant driving forces in the development of complex behaviors 
(Baldassarre and Mirolli, 2013). While there is still much debate 
about the correct definition of intrinsic motivations (Baldassarre, 
2011), the term is usually used when referring to behaviors such as
play or other "curious" exploration of the environment that seem 
unrelated to any immediate "extrinsic" goal such as the acquisi- 
tion of food. This hypothesis article does not propose any specific 
computational model, nor does it present any empirical results. It
merely discusses the new hypothesis in the context of existing
work. In the following, I briefly review a recently proposed form 
of IML for efficient sensory coding in active perception. Then I 
show how a generalization of this mechanism may account for the 
development of imitative behaviors. This also suggests a mecha- 
nism for the development of mirror neurons. Finally, I discuss 
predictions that the proposed mechanism makes. 

2. INTRINSICALLY MOTIVATED LEARNING FOR EFFICIENT 
CODING IN ACTIVE PERCEPTION 

The efficient coding hypothesis posits that sensory systems strive 
to encode sensory information in an efficient manner by exploit- 
ing the statistical structure and redundancies present in the 
sensory data (Attneave, 1954; Barlow, 1961). Since its first formu- 
lation, numerous aspects of sensory coding have been successfully 
explained in this context. This includes research on how early 
visual representations can be understood as adaptations to the 
statistics of natural images (Simoncelli and Olshausen, 2001) 
as well as related findings in the auditory (Smith and Lewicki, 
2006) and olfactory (Perez-Orive et al., 2002) modalities. While 
this research program has been highly successful, it has typically 
neglected the active nature of perception. In particular, the statis- 
tics of sensory signals are a result of both the natural environment 
and the organism's behavior. This implies that the behavior of 
the organism and in particular the movement of the sense organs 
could be utilized to make the encoding of sensory information 
more efficient. 

Along these lines and inspired by previous work from 
Schmidhuber (2009) proposing compression progress as an 
objective for IML, Zhao et al. (2012) have recently presented a
model that learns to efficiently encode visual input from two eyes,
see Figure 1A. Their approach proposes a form of IML using 
an internally generated reinforcement signal for learning efficient 
coding strategies in active perception. The method works as fol- 
lows: A sensory model learns to encode sensory data, while a 
reinforcement learner generates actions of the sense organs that 
help the agent to encode the sensory data efficiently. To this end, 
an internally generated reinforcement signal is given to the rein- 
forcement learner that reflects how well the sensory model is able 
to encode the input. 
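
The interplay of the two components can be illustrated with a minimal Python sketch. It is not the model of Zhao et al. (2012) but a toy version under strong simplifying assumptions: the input is a one-dimensional random "scene", the sensory model is a linear code fitted once to well-verged (fused) input rather than learned jointly with the behavior, and the reinforcement learner only keeps a running-average value for each candidate vergence action. All names (binocular_input, intrinsic_reward, DISPARITIES) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
WIDTH = 8                        # samples per monocular patch (assumed)
DISPARITIES = np.arange(-3, 4)   # residual disparity left by each candidate action

def binocular_input(disparity):
    """Left and right 1-D patches of a random scene, offset by `disparity`."""
    scene = rng.standard_normal(WIDTH + 6)
    left = scene[3:3 + WIDTH]
    right = scene[3 + disparity:3 + disparity + WIDTH]
    return np.concatenate([left, right])

# Sensory model (assumption): a linear code with WIDTH components fitted to fused input.
fused = np.stack([binocular_input(0) for _ in range(500)])
_, _, vt = np.linalg.svd(fused, full_matrices=False)
basis = vt[:WIDTH]               # rows are orthonormal code vectors

def intrinsic_reward(x):
    """Internally generated reward: negative reconstruction error of the code."""
    code = basis @ x
    return -np.sum((x - basis.T @ code) ** 2)

# Reinforcement learner (assumption): running-average value per vergence action.
value = np.zeros(len(DISPARITIES))
count = np.zeros(len(DISPARITIES))
for _ in range(2000):
    a = rng.integers(len(DISPARITIES))            # uniform exploration
    r = intrinsic_reward(binocular_input(DISPARITIES[a]))
    count[a] += 1
    value[a] += (r - value[a]) / count[a]         # incremental mean of the reward

best_disparity = int(DISPARITIES[np.argmax(value)])
```

Under these assumptions the action that leaves zero residual disparity acquires the highest value, because only then are the two monocular patches redundant and the reconstruction error vanishes.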

In the context of binocular vision Zhao et al. (2012) have 
shown that this mechanism elegantly explains the joint devel- 
opment of an efficient representation for stereo disparity in 
the sensory model and an accurate controller for vergence eye 
movements. In this setting, the system discovers that it is useful 
(intrinsically rewarding) to verge both eyes onto a common phys- 
ical point, because then the sensory model is able to encode the 
data more efficiently. This is because the images from both eyes 
become more redundant and their joint encoding by the sensory 
model becomes more accurate. We may think of this in terms of 
the affordance concept. The observation of a certain disparity at 
the center of gaze is found to afford a certain vergence command 
that will lead to an improved representation of this input. 

FIGURE 1 | The recently proposed intrinsically motivated learning
architecture for efficient coding in active perception (A) also gives rise 
to the development of imitation (B). (A) The learning architecture 
comprises an efficient coding model for the sensory input and an 
intrinsically motivated reinforcement learning mechanism for generating 
behavior. In the example of Zhao et al. (2012), the efficient coding model 
learns a sparse code for binocular images, while the reinforcement learner 
generates vergence eye movements. To this end, it receives from the 
sensory coding model a representation of the sensory input (thin arrow) 
and an internally generated reward signal reflecting how well the sensory 
model could encode the binocular input (thick arrow). Both the sensory 
coding model and the reinforcement learner try to optimize the encoding of 
the data. The system discovers that the input data can be encoded most 
efficiently when vergence commands are used to minimize binocular 
disparity. (B) The learner acquires an efficient encoding of speech signals 
provided by a tutor (big mouth). When the learner starts babbling (small 
mouth), the resulting acoustic signals are encoded by the sensory model 
that has been tuned to the tutor's speech. Signals that are easy to encode 
for the sensory model because the utterance sounds similar to the tutor's 
speech will produce a high reinforcement signal. Through this, the system's 
utterances are progressively driven to approximate the tutor's speech. 

Importantly, the learning of the sensory model and the eye
movement control develop jointly in this approach, driven by 
the identical objective of encoding the data efficiently. This 
mechanism has been shown to lead to fully autonomous and 
self-calibrating development of binocular vision and has been validated
on a real robot (Lonini et al., 2013). More recently, it has
also been extended to the development of smooth pursuit eye 
movements. Whether this approach can be extended to actions 
beyond eye movements is still an open question. 

The central assumption of this approach is the existence of an 
internally generated reinforcement signal that encourages move- 
ments of the sense organs leading to an improved encoding of 
the sensory stimulus. Research on perceptual fluency supports the 
plausibility of this assumption. It has been found that the ease 
of processing of a sensory stimulus is related to positive affect 
(Reber et al., 1998). If the ease of processing reflects
the quality of encoding of the stimulus by the sensory model,
then easy-to-encode stimuli should produce positive affect. This
positive affect may be due to the proposed internally generated 
reinforcement signal. 

One point requires some discussion, however. Simply trying to 
behave such that the incoming sensory signals are encoded most 
easily might drive the agent to more or less abolish sensory input. 
In the case of visual perception, the agent could simply close the 
eyes or stare at a blank wall. This would make the sensory sig- 
nals be encoded most easily, but is of little use otherwise. There 
are several ways to avoid this. A first solution is to introduce 
a separate mechanism for selecting what the agent will look at, 
while the described IML mechanism ensures that how the target 
object is being looked at is most efficient. For example, an atten- 
tion mechanism selects what object in the scene should be looked 
at, while the proposed IML mechanism ensures that this partic- 
ular object is well represented through vergence, smooth pursuit, 
and possibly other eye, head, and body movements. At the same 
time, it provides an optimized sensory encoding of the stimulus 
by properly taking into account the statistics of the sensory sig- 
nals resulting from these movements. A second solution to the 
problem is to measure the ease of encoding of the sensory data 
in relation to some notion of the complexity of the data or the 
amount of information it contains. For example, the sensory sig- 
nals resulting from staring at the blank wall may indeed be easy 
to encode (e.g., lead to a low reconstruction error of a genera- 
tive model), but they may contain very little information to start 
with. There are various ways of making these notions mathemat- 
ically precise, but the details are not important for the present 
paper. 
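
One possible way of making the second solution concrete is sketched below in Python. The particular measure is an assumption chosen for illustration, not a commitment of the proposal: the reward is the input energy minus the residual energy left after encoding, so that an input carrying no energy earns no reward however easy it is to encode. The basis and the two example inputs are hypothetical.

```python
import numpy as np

# Hypothetical sensory model: an orthonormal basis spanning the first two input dimensions.
basis = np.eye(4)[:2]

def reconstruction_error(x):
    """Residual energy after encoding x with the basis and decoding again."""
    return float(np.sum((x - basis.T @ (basis @ x)) ** 2))

def raw_reward(x):
    """Ease of encoding alone: maximal for an empty input."""
    return -reconstruction_error(x)

def information_aware_reward(x):
    """Energy captured by the code: input energy minus residual energy."""
    return float(np.sum(x ** 2)) - reconstruction_error(x)

blank_wall = np.zeros(4)                          # nothing to encode
textured_object = np.array([3.0, 4.0, 1.0, 0.0])  # mostly, but not fully, encodable
```

With these definitions raw_reward prefers the blank wall, whereas information_aware_reward prefers the textured object, which is the qualitative behavior the second solution asks for.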

Having introduced the recently proposed IML mechanism for 
efficient coding in active perception, we are now ready to con- 
sider its connection to imitation learning, which will require us 
to generalize it from movements of the sense organs to other 
motor acts. 

3. HOW INTRINSICALLY MOTIVATED LEARNING FOR 
EFFICIENT CODING MAY SUPPORT IMITATION 

The mechanism for IML in active perception discussed above 
could also lead to the development of a form of imitation learning,
as illustrated in Figure 1B. Consider the example of an infant
faced with the problem of acquiring speech by imitating the utterances
of her caregivers (or that of a juvenile song bird learning the
father's song). Let's assume that at a certain point in development 
the infant has already learned a reasonably good sensory representation
of what her native language sounds like (Kuhl, 2004).
This representation will continue to improve with age and expe- 
rience. When the infant vocalizes, her utterances will be processed 
by her own auditory system, which has already been tuned toward 
the sounds and words of her mother tongue. According to the 
IML mechanism described above, utterances that sound more like 
her mother tongue will be more easily encoded by her auditory 
system, which will lead to the generation of a higher reinforce- 
ment signal compared to utterances that sound dissimilar from 
her mother tongue. Thus, over time, the infant will adapt her 
utterances to the language she is exposed to, driven by her intrinsic
motivation to behave in such a way that the sensory data are
encoded easily for her auditory system. Importantly, this suggests 
that language specific information could enter the babbling pro- 
cess early on, with each utterance being evaluated in the light of 
already learned sensory representations. We will return to this 
point in the Discussion. 
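
The two phases can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a model of infant speech: the vocal tract is a fixed linear map shared by tutor and learner, the sensory model is a diagonal Gaussian fitted to the tutor's sounds (so that the code length of a sound is its negative log-likelihood), and motor learning is simple trial-and-error babbling that keeps a variation whenever it is rewarded more. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear "vocal tract": motor command (3,) -> sound features (5,).
TRACT = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)

# Sensory-only phase: fit a diagonal Gaussian model to the tutor's noisy utterances.
tutor_sound = TRACT @ np.array([1.0, -0.5, 0.8])
heard = tutor_sound + 0.05 * rng.standard_normal((200, 5))
mean, var = heard.mean(axis=0), heard.var(axis=0)

def intrinsic_reward(sound):
    """Negative code length (in nats) of a sound under the tutor-tuned model."""
    return -0.5 * float(np.sum((sound - mean) ** 2 / var + np.log(2 * np.pi * var)))

# Sensory-motor phase: babble around the current command, keep what is rewarded more.
motor = np.zeros(3)
initial_reward = reward = intrinsic_reward(TRACT @ motor)
for _ in range(2000):
    candidate = motor + 0.05 * rng.standard_normal(3)
    r = intrinsic_reward(TRACT @ candidate)
    if r > reward:
        motor, reward = candidate, r

# How far the learner's final sound is from the tutor's sound.
final_mismatch = float(np.linalg.norm(TRACT @ motor - tutor_sound))
```

Because this sensory model assigns a long code to sounds that are unlike the tutor's, silence is not rewarded here, and the learner's sound drifts toward the tutor's. With a different vocal tract for the learner, an exact match would in general no longer be reachable.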

An important question in this context is how the sensory 
model will learn to encode the caregiver's speech and when exactly 
the infant's speech will be easy to encode for the sensory model. 
The caregiver's utterances will necessarily sound different from 
the infant's utterances due to the different structure of their vocal 
tracts. For example, it is not clear why the sound of a certain
vowel produced by the infant with her vocal tract should be easy 
to encode for her auditory system, if this has been tuned to speech 
of her caregiver, whose vowels will generally differ in fundamental 
frequency and other parameters. For the case of vowel acquisition 
in the context of infant-caregiver interactions, it has been argued
that an automirroring bias can overcome this difficulty (Ishihara
et al., 2009; Miura et al., 2012).

3.1. RELATIVE TIMING OF SENSORY AND MOTOR LEARNING 

For the proposed IML mechanism, it may be maladaptive for the
learner to produce utterances at an excessive rate right after birth. 
If a sensory representation properly reflecting the correct target 
language (or song) is not acquired first, then the learner's auditory 
representation may become tuned to or even dominated by its 
own utterances. According to the proposed IML mechanism, the 
learner would then find rewarding whatever it is producing itself. 
This could potentially slow down learning of the native language. 
Enforcing a sufficient amount of passive exposure to the language 
may avoid this problem. 

Similarly, reducing plasticity in sensory areas at the end of a 
critical period and before the onset of vocalizations may also alle- 
viate this problem, because it prevents the sensory representation 
from becoming dominated by the sensory consequences of the 
agent's own actions. 

An alternative solution to the problem would be to reduce 
or switch off sensory learning during one's own vocalizations. 
Instead, the auditory feedback could be used to train a forward 
model that predicts the auditory feedback based on an effer- 
ence copy of the motor signals. Note that an accurate forward 
model allows planning and off-line learning without the need for 
producing actual motor output and observing the consequences.
This can dramatically speed up learning (Sutton and Barto, 1998) 
and could even happen during sleep. 
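
A forward model of this kind can be sketched as follows. The example assumes, purely for illustration, a linear vocal tract and a forward model fitted by ordinary least squares from recorded efference copies and auditory feedback; the names TRACT and predict_feedback are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear vocal tract, unknown to the learner: motor (3,) -> sound (5,).
TRACT = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)

# Babbling: record efference copies of motor commands and the noisy auditory feedback.
commands = rng.standard_normal((100, 3))
feedback = commands @ TRACT.T + 0.05 * rng.standard_normal((100, 5))

# Forward model: least-squares map from efference copy to predicted feedback.
weights, *_ = np.linalg.lstsq(commands, feedback, rcond=None)

def predict_feedback(command):
    """Predicted auditory consequence of a motor command, without vocalizing."""
    return command @ weights

# Off-line use: evaluate a candidate command against the true (hidden) tract.
candidate = np.array([1.0, -0.5, 0.8])
prediction_error = float(np.linalg.norm(predict_feedback(candidate) - TRACT @ candidate))
```

Once the prediction error is small, candidate commands could be scored by applying the internally generated reward to predict_feedback(command) instead of to actual auditory feedback.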

3.2. LEARNING ONE THING OR MANY? 

As discussed above, the absence of sensory input might be par- 
ticularly easy to encode for the sensory system. This might lead 
the infant to not vocalize at all. Several solutions are conceiv- 
able. First, as suggested above the quality of encoding of the 
sensory model could be relative to the complexity of the sen- 
sory input or the amount of information contained in it. In this 
way, the situation of not babbling at all could be made compara- 
tively undesirable. Second, a mechanism reinforcing the learning 
of novel cause and effect relationships or the "discovery of novel 
actions" (Redgrave and Gurney, 2006) could foster varied bab- 
bling. Third and maybe most obviously, the infant may want to 
communicate. 

The question remains what and how many different things 
might be acquired through this IML mechanism. Note that while 
some bird species only learn a single song that "crystallizes" dur- 
ing development, others learn thousands of utterances during 
their life time (Catchpole and Slater, 2003) as do humans. If 
the sensory model allowed for only a single song "template" to 
be stored, this might explain why only a single song is learned. 
If, however, the sensory model had a high capacity for storing 
many acoustic patterns with high fidelity, then a large repertoire 
of actions would be learned with this mechanism. In general, for 
any kind of sensory model there will be a trade-off: given a fixed 
storage capacity more patterns can only be stored at the cost of 
storing them with smaller fidelity. Such differences could con- 
tribute to the varied vocabulary sizes in different species of song 
birds. 

3.3. CONTEXT DEPENDENCE 

The mechanism described thus far will allow an agent to learn to 
imitate a range of utterances or behaviors whose sensory conse- 
quences match those of its learned sensory model. In the simplest 
case, however, all of these behaviors will appear equally "good" in 
any situation, i.e., what vocalization is performed would not nec- 
essarily depend on the current context. This could lead to behav- 
iors being produced in inappropriate contexts. How could the 
agent learn to generate a certain behavior only in the appropriate 
context? 

One solution is certainly through instrumental learning. If, say, 
the behavior has undesirable consequences in the present con- 
text, its execution may be made less probable because of this. 
A second solution to the problem is that during learning of the 
sensory model, contextual information is also integrated into the 
representation. Thus, the model will not be a purely sensory 
model anymore but a sensory-plus-context model. Specifically, 
if during the sensory-only phase of development, the infant or 
the song bird hears an utterance only in a specific context, then 
the developing sensory-plus-context model may encode this rela- 
tionship. Thereby, if the learner generates the behavior in the 
same context, this will be particularly easy to encode for the 
sensory-plus-context model. Conversely, if the behavior is produced
in a different context, this will be less easy to encode for the
sensory-plus-context model, because there is a mismatch between
the context and the sensory input. Obviously, relevant contexts 
are also perceived based on sensory, e.g., visual information. Thus 
a strict separation of sensory information and context may not 
always be possible. Interestingly, the context could be the pres- 
ence of a certain object to which the infant pays attention. In 
this case, an initial association between the visual appearance 
of the object, its acoustic label, and the motor representation
for generating the acoustic label can be established. In this situation,
the presence of the object would afford producing the
object's name. 
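
A sensory-plus-context model can be illustrated with a simple count-based sketch in Python. The contexts, utterances, and counts below are invented for illustration; the only assumption carried over from the text is that the reward reflects how cheaply the combination of context and utterance can be encoded, here measured as the negative code length under the learned joint distribution.

```python
import numpy as np

UTTERANCES = ["ball", "cup"]
CONTEXTS = ["ball_visible", "cup_visible"]

# Sensory-only phase: hypothetical (context, heard utterance) observations.
observations = (
    [("ball_visible", "ball")] * 18 + [("ball_visible", "cup")] * 2
    + [("cup_visible", "cup")] * 17 + [("cup_visible", "ball")] * 3
)

counts = np.ones((len(CONTEXTS), len(UTTERANCES)))   # add-one smoothing
for context, utterance in observations:
    counts[CONTEXTS.index(context), UTTERANCES.index(utterance)] += 1
joint = counts / counts.sum()                        # estimate of p(context, utterance)

def intrinsic_reward(context, utterance):
    """Negative code length (bits) of the pair under the sensory-plus-context model."""
    return float(np.log2(joint[CONTEXTS.index(context), UTTERANCES.index(utterance)]))
```

In this toy setting, producing "ball" while the ball is visible is rewarded more than producing "cup" in the same context, so the same intrinsic signal that shapes the utterance also ties it to its context.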

4. DEVELOPMENT OF MIRROR NEURONS 

Mirror neurons are a class of neurons first observed in the pre- 
motor cortex of monkeys (Gallese et al., 1996) whose defining 
characteristic is that they can be activated if the monkey observes 
another agent performing a certain behavior or if the monkey 
plans and executes the same behavior. Because of this, they have 
been implicated in action understanding, imitation, empathy and 
language acquisition (Rizzolatti and Arbib, 1998; Gallese et al., 
2004; Rizzolatti and Craighero, 2004). While originally discovered 
in monkeys, there is converging evidence for a mirror neuron system
in humans (Iacoboni et al., 1999) and song birds (Prather
et al., 2008). While the question of how mirror neurons could support
imitation has received much interest (Iacoboni et al., 1999;
Iacoboni, 2005, 2009), comparatively little work has investigated 
how mirror neurons develop ontogenetically and what learning 
processes drive this development (Heyes, 2010). 

Complementary mechanisms have been proposed for the 
development of mirror neurons based on generic learning prin- 
ciples. The most popular one is that mirror neurons develop 
through associative learning mechanisms such as Hebbian learn- 
ing (Heyes and Ray, 2000; Keysers and Perrett, 2004; Heyes 
et al., 2005; Catmur et al., 2007; Cooper et al., 2013). A second
mechanism is that mirror neurons could develop through reward- 
dependent (instrumental, reinforcement) learning (Triesch et al., 
2007). We will take a look at both mechanisms before describ- 
ing a new one based on IML for efficient coding, which combines 
aspects of the other two. 

4.1. HEBBIAN DEVELOPMENT OF MIRROR NEURONS 

Hebbian accounts work as follows (Heyes and Ray, 2000; Keysers
and Perrett, 2004; Del Giudice et al., 2009). In the case of behaviors
whose sensory consequences are easily observed, such as
seeing one's own reaching movement or hearing one's own utter- 
ances, it is assumed that Hebbian learning forms associations 
between simultaneously active sensory and motor representations 
for already learned skills. As a result, neurons involved in the 
execution of a specific behavior receive strong excitatory connec- 
tions from neurons representing its sensory consequences and 
vice versa. When another agent is then observed performing the 
same action, the same sensory representations will be triggered 
due to their ability to generalize to similar sensory stimuli. It 
has been argued that such generalization ability may stem from 
maturational constraints of the visual system (Nagai et al., 2011).
The activated sensory representation then excites the correspond- 
ing motor representation via the associative connections learned 
through the Hebbian mechanism. Through this, the motor representation
has obtained mirror properties: it is activated by
planning or executing a behavior and by merely observing it in 
another agent. 
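
The Hebbian account can be illustrated with a minimal Python sketch. It assumes, for illustration only, one motor unit per already learned action, fixed sensory prototypes for each action, and a plain outer-product Hebbian rule; only the sensory-to-motor direction of the association is modeled. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

N_ACTIONS, N_SENSORY = 3, 6
# Hypothetical sensory prototypes: action a drives sensory units 2a and 2a+1.
prototypes = np.repeat(np.eye(N_ACTIONS), 2, axis=1)

weights = np.zeros((N_ACTIONS, N_SENSORY))    # sensory -> motor associations
for trial in range(300):
    action = trial % N_ACTIONS
    motor = np.eye(N_ACTIONS)[action]          # motor unit active during execution
    # The agent perceives the sensory consequences of its own action (with noise).
    sensed = prototypes[action] + 0.1 * rng.standard_normal(N_SENSORY)
    # Hebbian update: co-active motor and sensory units strengthen their connection.
    weights += 0.01 * np.outer(motor, sensed)

def motor_response(observed):
    """Motor activation evoked by merely observing a sensory pattern."""
    return weights @ observed

# Observing another agent: similar, but not identical, sensory patterns.
mirrored = [
    int(np.argmax(motor_response(prototypes[a] + 0.1 * rng.standard_normal(N_SENSORY))))
    for a in range(N_ACTIONS)
]
```

After learning, observing another agent's action activates most strongly the motor unit for the same action, which is the mirror property described above. The sketch also makes the limitation visible: associations form only for actions the agent can already perform.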

The situation is more difficult for behaviors where the agent 
cannot fully perceive the sensory consequences of its actions as 
in the generation of facial expressions. For such "opaque" cases it 
is assumed that the agent learns to imitate by first being imitated 
by another agent, usually the caregiver. For example, when an
infant smiles and her mother imitates the smile, the infant can
learn to associate the visual representation of the mother's smiling
face with the infant's own motor representation for smiling. Again, the
motor representation assumes mirror properties due to Hebbian 
learning. While overall the account appears plausible, a limitation 
is that it only develops mirror representations for skills that have 
already been learned. The learning of novel behaviors is left to 
random exploration which is very inefficient when many motor 
degrees-of-freedom are involved as is the case in speech or song 
production, i.e., when learning takes place in a high-dimensional 
space. 

4.2. REWARD-DRIVEN DEVELOPMENT OF MIRROR NEURONS 

In the reward-based learning account, the agent discovers that 
performing a certain behavior is useful whenever it sees another 
agent perform this behavior. For example, when a developing 
monkey observes a conspecific grasping a peanut from a source, 
the resulting sensory representation can become associated with 
the monkey's own motor plan for grasping a peanut from the same
source, which is inherently rewarding, especially when hungry.
Note that this mechanism does not require the ability to observe 
the sensory appearance of one's own action, but only whether 
it leads to a positive, i.e., reinforcing outcome. Circumstantial 
evidence for the importance of reward-driven learning in the 
development of mirror neurons comes from a recent finding that 
mirror neurons in monkey premotor area F5 are modulated by 
the value the monkey assigns to a grasped object (Caggiano et al., 
2012). 

The reward-driven account was studied in greatest detail in the 
context of gaze following, where an agent learns to look where 
others are looking. This is an example of a behavior where the 
sensory appearance of the behavior cannot be observed while 
the agent performs it. Triesch et al. (2007) proposed a compu- 
tational model for the development of gaze following and showed 
that it produced mirror neurons for looking behaviors. It also 
explained various other aspects of the development of gaze following
(Jasso et al., 2012). The existence of mirror neurons was
the central prediction of the model and it was later confirmed 
neurophysiologically (Shepherd et al., 2009). 

Interestingly, the reward-driven learning mechanism also pre- 
dicts the possibility of generalized mirror neurons (Triesch et al., 
2007). An agent may discover that it is useful to perform some 
action A whenever another agent is observed performing an 
action B. Gaze following represents a simple example of this: 
when two agents face each other, proper gaze following requires 
the learning agent to turn the head to his left if the model 
is observed turning the head to its right. Thus, it is not the physical
appearance of the movement that matters, but the goal of the
action: where should I look? Through the reward-driven learning
mechanism, an association can be learned between the sensory
representation corresponding to the observation of the other 
agent performing action B and one's own motor representation of 
action A. This would lead to generalized mirror neurons for which 
the observed action triggering them is not necessarily identical to 
the action being generated. 
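
The gaze following example can be sketched as a small tabular reinforcement learning problem in Python. This is a toy illustration and not the model of Triesch et al. (2007): two agents are assumed to face each other, the learner receives a reward of 1 whenever it ends up looking at the same side of the room as the other agent, and action values are learned with an epsilon-greedy rule. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

OBSERVED = ["other_turns_to_its_left", "other_turns_to_its_right"]
OWN = ["turn_left", "turn_right"]

def reward(observed, own):
    """1 if the learner looks where the other agent looks, else 0.

    Because the agents face each other, the other's left is the learner's right,
    so the rewarded own action has the opposite index of the observed action.
    """
    return 1.0 if observed != own else 0.0

q = np.zeros((2, 2))                        # q[observed, own]: learned action values
for _ in range(500):
    observed = int(rng.integers(2))
    if rng.random() < 0.2:
        own = int(rng.integers(2))          # explore
    else:
        own = int(np.argmax(q[observed]))   # exploit current values
    q[observed, own] += 0.1 * (reward(observed, own) - q[observed, own])

policy = {OBSERVED[o]: OWN[int(np.argmax(q[o]))] for o in range(2)}
```

The learned policy maps the observation of the other agent turning to its right onto the learner's own leftward turn, and vice versa, so the observed and the executed action differ in appearance but share the same goal.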

4.3. INTRINSICALLY MOTIVATED DEVELOPMENT OF MIRROR 
NEURONS 

The proposed IML mechanism integrates ideas from the Hebbian 
and the reward-based accounts. Like the Hebbian mechanism, 
it requires that the sensory consequences of the actions can be 
perceived. The development of mirror neurons could proceed 
along the following steps. (1) During sensory-only learning, a 
sensory model of various behaviors produced by the tutor is 
learned. Associated with this model, we assume that there will be 
populations of neurons specific to the perception of these differ- 
ent behaviors. (2) During the sensory-motor phase, the learner 
acquires motor representations that produce the same sensory 
consequences by virtue of the proposed IML mechanism. This 
involves the learner's reward system, but the reinforcement signals 
are internally generated. In the end, specific motor representa- 
tions and the associated populations of neurons will code for 
specific behaviors. (3) Since these motor representations trigger 
specific sensory consequences, Hebbian learning mechanisms can 
establish a bidirectional association between the motor represen- 
tation and the sensory representation. Through this, the sensory 
representation will acquire some motor properties and the motor 
representation will acquire some sensory properties. The clear 
distinction between sensory and motor representations dissolves 
and neurons with mirror properties develop: They are active 
when their sensory representation is triggered during observation 
of the behavior of another agent and during planning and execution
of the corresponding behavior. Note that the three steps
could also overlap in time. 

The computational benefit of the IML mechanism over the 
Hebbian mechanism is that the discovery of new skills is not left to 
random exploration, but occurs under guidance from the sensory 
model. Exploration is focused on those behaviors that produce 
similar sensory consequences as the behavior of conspecifics. The 
computational advantage over the reward-based mechanism is 
similar. The discovery of new skills does not require an external
reward such as the peanut in the above example; instead, the mechanism
makes matching one's behavior with that of a conspecific intrinsically
rewarding. This seems to better reflect the true nature of at
least human imitation. 

5. DISCUSSION 

I have described how a recently proposed mechanism for IML 
for efficient coding in active perception can be generalized to 
support imitative learning. In addition, a corresponding account 
for the development of mirror neurons was presented. It com- 
bines previous proposals based on associative Hebbian learning 
and instrumental or reinforcement learning in the framework 
of IML. These mechanisms represent parallel pathways through 
which mirror neurons can be acquired. Once established through 
either of these mechanisms, it is easy to see how mirror neurons 
could contribute to various forms of imitation including 
automatic imitation (Heyes, 2010) and vocal mimicry. 

The IML mechanism proposed here is compatible with many 
previous theoretical accounts and computational models of bird 
song learning. A full review of these works is beyond the scope of 
this article. Existing works typically assume that a reinforcement 
signal is derived from matching auditory feedback to a stored 
sensory template (Doya and Sejnowski, 1995; Troyer and Doupe, 
2000). Here I have proposed that such a reward signal could be 
derived from an evaluation of how well the auditory feedback is 
encoded by a sensory model. This distinction is admittedly sub- 
tle, but it connects the present approach to theories on efficient 
coding and sparse coding models such as those used in our work on 
the role of the same IML mechanism in active perception (Zhao 
et al., 2012; Lonini et al., 2013). This may be important, since neural 
representations in certain parts of the song system are known 
to be very sparse (Hahnloser et al., 2002). 
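
The distinction between the two kinds of reward signal can be made concrete with a small sketch. It rests on several assumptions that are not part of the proposal: sensory signals are plain vectors, the sensory model is a linear PCA code (a simple stand-in for a sparse coding model), encoding quality is measured by squared reconstruction error, and the template is the mean tutor signal. The function names are hypothetical.

```python
import numpy as np


def learn_sensory_model(tutor_signals, n_components):
    """Fit a linear code (PCA basis) to tutor signals; stand-in for sparse coding."""
    mean = tutor_signals.mean(axis=0)
    _, _, vt = np.linalg.svd(tutor_signals - mean, full_matrices=False)
    return mean, vt[:n_components]


def encoding_reward(signal, model):
    """Reward is high when the model reconstructs the signal with little error."""
    mean, basis = model
    centered = signal - mean
    reconstruction = basis.T @ (basis @ centered)
    return -float(np.sum((centered - reconstruction) ** 2))


def template_reward(signal, template):
    """Reward is high when the signal is close to one stored template."""
    return -float(np.sum((signal - template) ** 2))


rng = np.random.default_rng(0)
# Tutor signals vary within a 2-D subspace of a 10-D sensory space (assumed).
q, _ = np.linalg.qr(rng.normal(size=(10, 3)))
subspace, off_direction = q[:, :2], q[:, 2]
tutor_signals = rng.normal(scale=3.0, size=(200, 2)) @ subspace.T
tutor_signals += rng.normal(scale=0.01, size=tutor_signals.shape)

model = learn_sensory_model(tutor_signals, n_components=2)
template = tutor_signals.mean(axis=0)

novel_but_encodable = 5.0 * subspace[:, 0]  # new signal with tutor-like structure
hard_to_encode = 5.0 * off_direction        # equally large signal outside that structure

for name, s in [("encodable", novel_but_encodable), ("hard", hard_to_encode)]:
    print(name, encoding_reward(s, model), template_reward(s, template))
```

Under these assumptions, the encoding-based reward separates the two test signals clearly, whereas the template-based reward penalizes both for being far from the stored template. A learner with a repertoire of many different songs is one setting in which the two signals would be expected to diverge in this way.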

The examples of human language acquisition and bird song 
learning are special in that the sensory consequences of the behav- 
ior are readily perceived. Obviously, the proposed mechanism can 
be extended to other actions that are easily perceived such as man- 
ual actions. For other actions such as facial expressions, this is not 
straightforward (unless a mirror is available). Learning to imitate 
facial expressions may require other mechanisms such as being 
imitated by caregivers (Heyes, 2001) or rely on reinforcement 
learning mechanisms and social feedback. 

The presented mechanism is rooted in the efficient coding 
hypothesis. As such, it somewhat downplays the importance of 
social feedback during speech and song acquisition. But the social 
context in which learning takes place is known to play a very 
important role both in human language acquisition and bird song 
learning (Goldstein et al., 2003; Kuhl et al., 2003). In the words 
of Goldstein and Schwade (2008): "infants' prelinguistic vocaliza- 
tions, and caregivers' reactions to those immature sounds, create 
opportunities for social learning that afford infants knowledge of 
phonology." 

The proposed IML mechanism also shares some aspects of 
previous work on imitation in the developmental robotics literature. 
For instance, Gaussier et al. (1998) and Andry et al. (2001) 
propose a robot where a mechanism of "cognitive homeosta- 
sis" would give rise to imitative behaviors. Due to a "perceptual 
ambiguity" the robot may mistake an optic flow field caused by 
observing a moving agent for the flow field produced by its 
own locomotion. The homeostasis drive would try to minimize 
the mismatch between the sensory input stream and the robot's 
motor commands such that the robot will start moving. This is 
suggested to lead to an immediate following behavior. They then 
present experiments with a real robot that has a different prewired 
following mechanism. It learns to store extended sequences of 
movements resulting from following another robot or a human 
if these sequences lead to a reward. In our case, imitation does 
not emerge from a drive to reduce the mismatch between sensory 
percepts and own motor commands or from a prewired following 
mechanism but from a reinforcement signal that favors move- 
ments whose sensory consequences can be encoded efficiently by 
the sensory system. 



Kaplan and Oudeyer (2007) have considered an intrinsic moti- 
vation for maximizing learning progress and discussed its poten- 
tial role in the development of imitation. After illustrating how 
an intrinsic motivation for learning progress allows an agent 
to tackle progressively more difficult learning problems by dis- 
covering "progress niches," they speculate that such an intrinsic 
motivation may also contribute to the development of imitation. 
Specifically, they argue that "(1) the meaningful distinctions nec- 
essary for the development of imitation (self, others and objects 
in the environment) may be the result of discriminations con- 
structed during a progress-driven process and that (2) imitative 
behavior can more generally be understood as a way of producing 
actions in order to experience learning progress." They specu- 
late that at different stages of development infants may engage 
in different kinds of imitative behaviors because they maximize 
the infant's current learning progress. Here, in contrast, I argue that imitative 
behaviors are reinforced because their sensory consequences 
can be encoded efficiently by the learner's sensory model. 

How could the proposed IML mechanism be tested experi- 
mentally? In the context of human language learning, it suggests 
that the babbling process might already reflect some aspects of the 
statistical properties of the language to which the infant has been 
exposed. This in turn predicts that the babbling process of infants 
could be shaped by carefully controlling their language input. For 
example, we may speculate that when caregivers intuitively reply 
to babbling attempts by uttering "close" words from the target 
language, they will affect the infant's sensory model in such a way 
that the correct pronunciation of the "close" word is reinforced 
during future babbling attempts. In contrast, replying to an infant's 
babbling attempts with arbitrary, different-sounding words will 
not produce this effect. Other aspects of child-directed speech 
such as hyperarticulation are also thought to aid the infant in 
learning a sensory model of the target language (Kuhl et al., 1997). 
More research is needed to investigate if and how infants' babbling 
is shaped by their developing sensory model of the target language 
through internally generated reinforcement signals. 

In the context of bird song learning, the IML mechanism could 
be tested most directly by recording from reward circuits in the 
song bird brain as the animal is learning its song. The most obvi- 
ous and direct prediction is that utterances sounding more similar 
to the father's song will generate a higher reward signal because 
they are easier to encode for the bird's auditory system, while 
utterances sounding dissimilar from the father's song will gen- 
erate a lower reward signal because they are harder to encode. 
By manipulating the auditory feedback the bird is receiving, the 
causal role of this sensory feedback in learning can be tested. Note, 
however, that disentangling whether a stronger reinforcement sig- 
nal is due to an easier encoding of the sensory signals or a greater 
similarity of the auditory feedback to a stored template may be 
difficult. To this end, it may be important to consider song bird 
species learning many different songs. 

In addition to testing the proposed mechanism and its possible neural 
implementation in biological experiments, it will also be 
interesting to apply the idea in the context of robots. For example, 
future work could try to exploit the proposed IML mechanism 
for language learning in robots. This will help to identify possible 
limitations or inconsistencies of the approach. The experiences 
gained would help to further develop and refine the current proposal. 
In conclusion, it is intriguing that the venerable principle 
of efficient sensory coding may play a central role in sophisticated 
cognitive phenomena such as imitation and language acquisition. 